Introduction

This project explores the prevalence of smoking in the United States and the emerging vaping trends among youth. These trends matter to health researchers, medical professionals and governments because they are a key component in understanding the overall health of a population. We aim to understand where the United States sits globally in terms of smoking trends, what patterns have emerged in the past few decades, and where the US is likely heading with smoking and vaping trends among its youth.

Questions/Aims

The first question is: what are the global trends in cigarette smoking rates? This question aims to establish the wider picture of trends in recent years and to reveal the US’s place within them.

Next, we look at the United States as a whole and ask: what have been the trends in smoking rates in the United States from 1995 to 2010? We facet by geographic region (Northeast, Midwest, South and West) and by smoking frequency, exploring daily smokers, weekly smokers, former smokers, and non-smokers.

Furthermore, we look specifically at the years 2005 to 2009 and determine the largest costs to the state associated with cigarette smoking. Regarding trends by sex, we explore whether the percentage of male deaths from smoking in this period is the same as that of females when averaged across the states. We also explore whether there is a correlation between the overall cost to a state and the number of individuals dying from smoking-related illnesses.
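The two comparisons described above can be sketched as a paired t statistic (male versus female percentage, one pair per state) and a Pearson correlation (cost versus deaths). This is only a minimal sketch: the function names `paired_t_statistic` and `pearson_r` and all of the numbers are made up for illustration and are not taken from the SAMMEC data or from this project's analysis.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """t statistic for the mean of the paired differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-state values (NOT real data): percentage of deaths
# attributed to smoking by sex, total deaths, and total cost in $ millions.
male_pct = [22.0, 19.5, 24.1, 20.3, 21.7]
female_pct = [15.2, 14.8, 17.0, 13.9, 16.1]
deaths = [7200, 1500, 28000, 9800, 4300]
cost_millions = [1900, 420, 7100, 2600, 1200]

print("paired t statistic:", round(paired_t_statistic(male_pct, female_pct), 2))
print("Pearson r:", round(pearson_r(deaths, cost_millions), 3))
```

A t statistic far from zero would suggest the male and female percentages differ on average across states; a p-value would additionally require the t distribution with n - 1 degrees of freedom, which is omitted here.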

Finally, we look at the trends in vaping and e-cigarette usage among youth from 2011 to 2020 and see whether we can predict a young person's smoker status from their age, sex, vaping status and vaping frequency.

Results

Discussion

Overall, interesting trends are emerging in both smoking and vaping. In exploring our first question of where the United States stands globally on smoking rates, we see that it is far from the highest-smoking nation: the share of its population that smokes is roughly a third of that in the countries with the highest smoking rates. Furthermore, we observed that smoking rates are consistently decreasing globally.

When looking at the patterns in the United States throughout the late 1990s and 2000s, we see the same overall downward trend in smoking rates. This is unsurprising given the wide-scale public education campaigns of the late 20th century on the harmful effects of smoking. Furthermore, as more information emerged on the long-term impacts of smoking, it became easier to identify certain illnesses as a direct result of cigarette consumption. Smoking is no longer the ‘norm’: only roughly 20% of US adults smoke, and, given the heavy regulation of cigarettes, smoking is no longer glorified on film and television. So it is unsurprising that the number of non-smokers is increasing as new generations shun the habit and older generations unfortunately pass away from smoking-related illnesses.

Furthermore, it is unsurprising that the US is now experiencing such a high economic burden from smoking-related illnesses, given the lag between the high point of smoking in the 1950s and the time needed for individuals to develop these illnesses. Given the direct correlation between the number of individuals getting sick and dying from smoking and the cost of hospitalizing and caring for them, both the log-log and KNN models were very strong in predicting the average cost for a state from the number of individuals dying from smoking-related illnesses. Such models would be very useful to US state governments, which could use them to allocate appropriate funding to the healthcare sectors that manage smoking-related illnesses. Furthermore, as prescription drugs are included in the costings, these models are also valuable to pharmaceutical companies, which could adjust their production of medication to match demand.
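As a rough illustration of the two approaches named above, the sketch below fits a log-log linear regression and a k-nearest-neighbours average to synthetic data. The data, the function names (`fit_log_log`, `predict_log_log`, `predict_knn`), the choice of `k=3` and the generating exponent are all assumptions made for illustration; they are not the values or the code used in this project.

```python
import math
import random

def fit_log_log(deaths, costs):
    """Least-squares fit of log(cost) = intercept + slope * log(deaths)."""
    xs = [math.log(d) for d in deaths]
    ys = [math.log(c) for c in costs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def predict_log_log(model, deaths):
    """Back-transform the fitted line to a cost on the original scale."""
    intercept, slope = model
    return math.exp(intercept + slope * math.log(deaths))

def predict_knn(train_deaths, train_costs, deaths, k=3):
    """Average the costs of the k training states closest in death count."""
    pairs = sorted(zip(train_deaths, train_costs),
                   key=lambda pair: abs(pair[0] - deaths))
    nearest = pairs[:k]
    return sum(cost for _, cost in nearest) / len(nearest)

# Synthetic "states" (NOT real data): cost grows roughly as deaths ** 1.05,
# with multiplicative noise.
random.seed(0)
state_deaths = [random.uniform(500, 40000) for _ in range(50)]
state_costs = [0.02 * d ** 1.05 * math.exp(random.gauss(0, 0.1))
               for d in state_deaths]

model = fit_log_log(state_deaths, state_costs)
print("fitted slope on the log-log scale:", round(model[1], 3))
print("log-log prediction for 10,000 deaths:",
      round(predict_log_log(model, 10000), 1))
print("KNN prediction for 10,000 deaths:",
      round(predict_knn(state_deaths, state_costs, 10000), 1))
```

The slope of the log-log fit can be read as an elasticity: a slope near 1 means cost scales roughly in proportion to deaths. KNN makes no such shape assumption but cannot extrapolate beyond the range of the training states.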

Finally, the large consumption of emerging smoking technologies, including vapes, by youth is a concern, as there have not been enough longitudinal studies on the long-term health effects of vaping. As highlighted above, e-cigarettes were first introduced into the market in 2003, so there is at most 20 years of data available on the consumption of e-cigarettes and vaping products. Any data claiming that vapes are ‘healthier’ than cigarettes therefore cannot account for long-term consequences that have yet to emerge. In regard to detecting whether a youth has tried cigarettes from, among several predictors, whether or not they have vaped before, both the logistic model and the classification trees performed only moderately. Both models were better at detecting non-smokers than smokers, despite the undersampling employed to ensure an even number of both in the test data. This may be because the response variable, whether an individual has ever smoked or not, is too general and does not account for the complexity of the circumstances in which, or the frequency with which, individuals consume vapes or cigarettes. Numerous other predictors related to smoking and its prevalence, such as social circles, parental smoking status and socio-economic background, may yield better results than the predictors the models used. Alternatively, there may simply not be any strong link between vaping and smoking.
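To make the class-by-class comparison above concrete, the sketch below undersamples a synthetic evaluation set to equal class counts and then reports the detection rate for each class separately. Everything here is an assumption for illustration: the data are randomly generated, the field names `vaped` and `smoked` are hypothetical, and a one-rule classifier ("predict smoker if the youth has vaped") stands in for the logistic regression and classification tree models, which are not reproduced.

```python
import random

def undersample(rows, seed=0):
    """Randomly drop rows from the larger class until both classes are equal."""
    rng = random.Random(seed)
    smokers = [r for r in rows if r["smoked"]]
    non_smokers = [r for r in rows if not r["smoked"]]
    n = min(len(smokers), len(non_smokers))
    balanced = rng.sample(smokers, n) + rng.sample(non_smokers, n)
    rng.shuffle(balanced)
    return balanced

def sensitivity_specificity(actual, predicted):
    """Return (share of smokers detected, share of non-smokers detected)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    return tp / (tp + fn), tn / (tn + fp)

# Synthetic respondents (NOT survey data): the probabilities are arbitrary.
rng = random.Random(1)
youths = []
for _ in range(2000):
    vaped = rng.random() < 0.3
    smoked = rng.random() < (0.5 if vaped else 0.1)
    youths.append({"vaped": vaped, "smoked": smoked})

test_set = undersample(youths)
actual = [y["smoked"] for y in test_set]
predicted = [y["vaped"] for y in test_set]  # stand-in for a fitted model

sens, spec = sensitivity_specificity(actual, predicted)
print(f"smokers detected (sensitivity): {sens:.2f}")
print(f"non-smokers detected (specificity): {spec:.2f}")
```

On a balanced set, overall accuracy is simply the average of these two rates, so a gap between them is invisible in a single accuracy figure; reporting both makes the imbalance in performance explicit.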

Conclusion

The United States is a microcosm of larger global trends in shrinking smoking rates. It is currently enduring its most challenging era of smoking as it deals with the consequences of long-term mass cigarette consumption; the costs of this challenge can be effectively modeled and accurately predicted by either linear regression or KNN. Fortunately, the US is also seeing decreases in smoking populations in every region, and new generations are emerging that are not taking up smoking. Vaping among youth is on the rise, a situation that will be closely observed as more data emerges on the long-term effects of vaping. While debate continues as to whether vaping has a causal relationship with smoking, our logistic regression and classification tree models, while decent at detecting non-smokers, had only moderate success detecting smokers.

Dataset Citations

\(\textit{WHO Tobacco and Smoking Data 2008-2018}\). Kaggle: World Health Organization; 2008-2018. Licence: CC BY-NC-SA 3.0 IGO. Data accessible at: https://www.kaggle.com/datasets/ozgurdogan646/who-tobacco-and-smoking-data-20082018?select=SmokingAndTobaccoData2008.csv

\(\textit{BRFSS Prevalence and Trends Data: Tobacco Use - Four Level Smoking Data for 1995-2010}\), Centers for Disease Control and Prevention (CDC). Behavioral Risk Factor Surveillance System. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2015. Data accessible at: https://data.cdc.gov/Smoking-Tobacco-Use/BRFSS-Prevalence-and-Trends-Data-Tobacco-Use-Four-/8zak-ewtm

\(\textit{Smoking-Attributable Mortality, Morbidity, and Economic Costs (SAMMEC) - Smoking-Attributable Expenditures (SAE)}\), Centers for Disease Control and Prevention (CDC). Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2020. Data accessible at: https://chronicdata.cdc.gov/Health-Consequences-and-Costs/Smoking-Attributable-Mortality-Morbidity-and-Econo/ezab-8sq5

\(\textit{Smoking-Attributable Mortality, Morbidity, and Economic Costs (SAMMEC) - Smoking-Attributable Mortality (SAM)}\), Centers for Disease Control and Prevention (CDC). Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2020. Data accessible at: https://chronicdata.cdc.gov/Health-Consequences-and-Costs/Smoking-Attributable-Mortality-Morbidity-and-Econo/4yyu-3s69

\(\textit{National Youth Tobacco Survey (NYTS)}\), Centers for Disease Control and Prevention (CDC). Office on Smoking and Health, National Center for Chronic Disease Prevention and Health Promotion. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, 2022. Data accessible at: https://www.cdc.gov/tobacco/data_statistics/surveys/nyts/data/index.html

\(\textit{State Intercensal Tables: 2000-2010}\), United States Census Bureau, 2021. Data accessible at: https://www.census.gov/data/tables/time-series/demo/popest/intercensal-2000-2010-state.html